Python Implementation of Stratified Random Sampling
Population Dataset
In this example, the population is defined as 25,000 college students.
Variables in the population dataset:
| 0 |
SID000001 |
Male |
18 |
White |
Hispanic or Latino |
Freshman |
4.00 |
| 1 |
SID000002 |
Female |
40 |
Two or more races |
Hispanic or Latino |
Continuing Education |
0.00 |
| 2 |
SID000003 |
Female |
19 |
White |
Not Hispanic or Latino |
Junior |
3.23 |
Weight Calculation for Survey Analysis
Post-Stratification Weights
Post-Stratification weights can be calculated in different ways:
Each stratum individually
Combined categories (joint distribution considered)
Note. weights can be a fraction, but are always positive and non-zero.
Today’s demonstration is similar to individual strata weighting but accounts for joint distribution.
Calculate Weights in Respondents Data
Presentation Details in Conference Program
Abstract: Stratified sampling helps ensure representative data subsets, especially when working with imbalanced groups. In this talk, we’ll explore its use in survey analysis, demonstrate a Python-based implementation, and share best practices for improving data reliability and reducing bias. Attendees will gain a clear understanding of when and how to use stratified sampling effectively in real-world scenarios.
Description: Whether you’re conducting surveys, building predictive models, or working with imbalanced datasets, stratified sampling is a powerful technique for ensuring your data is representative and your insights are reliable.
In survey research, particularly with diverse populations like college students, simple random sampling can lead to under-representation of key subgroups. Stratified sampling addresses this by dividing the population into distinct strata (e.g., age groups, majors, demographics) and sampling proportionally from each. This approach reduces sampling bias and improves the reliability of statistical estimates.
In this presentation, I will demonstrate how to use Python to implement stratified sampling in a student survey at our college. By maintaining proportional representation across age groups, we ensured that our sample reflected the diversity of the student body. I’ll walk through the conceptual foundations of stratified sampling, compare it with simple random sampling, and highlight its advantages in survey research and data analysis.
We’ll also explore how to calculate weighted means when sample proportions differ from population proportions, which is a crucial step in producing unbiased estimates. Code examples and visualizations will be shared via GitHub to support hands-on learning and reproducibility.